2022-01-20

Plan for today

  • Object oriented programming: a practical guide
  • Graphics in R

So, have you ever wondered…

How do you add objects other than numeric values?

ggplot(mtcars, aes(x=mpg, y=hp)) + geom_point()

So, have you ever wondered…

What might this mean?

print
## function (x, ...) 
## UseMethod("print")
## <bytecode: 0x5646850e3870>
## <environment: namespace:base>

What is this UseMethod() thing?

Object oriented programming

  • classes of objects
  • the same operator or function has different effects on different objects
  • for example, different print and summary functions are called depending on the class of the object to print or summarize
  • In the OO context, a function that implements an operation for a particular class is called a “method”
  • Functions such as print and summary, which select the right method for their argument, are called “generics”

R: S3 and S4

R offers several frameworks for OO programming:

  • S3: easy to use, easy to understand, simple, informal, but elegant
  • S4: complex, formalized, safer
  • R6: yet another approach (also quite formalized), provided by the R6 package and very similar to OO in other programming languages.

All three are used in parallel. For example, Bioconductor mostly uses S4, while tidyverse and base R rely mostly on S3.

Example: print

class(starwars)
print(starwars)
print.data.frame(starwars)
tibble:::print.tbl(starwars)

Although a tibble is also a data frame, the classes listed earlier in its class vector (tbl_df, tbl) take precedence over data.frame, so it is displayed with the print method from the tibble package.

What happens under the hood

  • When we call print on starwars, R goes through the classes of the object in order. For class tbl, it looks for a function called print.tbl.
  • We cannot see print.tbl directly, because it is not exported from the package namespace, but it was registered when the tibble package was loaded, so R can find it.
  • If such a function is found, it will be used.
  • Otherwise, the next class in the class vector will be attempted
  • If everything else fails, the default print method (print.default) will be used
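
The lookup order described above can be tried out in plain base R. This is a minimal sketch; the generic describe and the classes child and parent are made up for illustration.

```r
# Hypothetical generic and classes, used only to illustrate S3 dispatch
describe <- function(x, ...) UseMethod("describe")
describe.default <- function(x, ...) "default method"
describe.parent  <- function(x, ...) "parent method"

obj <- structure(list(), class = c("child", "parent"))

# There is no describe.child yet, so R moves on to describe.parent
r_parent <- describe(obj)

# A plain number has neither class, so describe.default is used
r_default <- describe(42)

# Once a method for the first class exists, it takes precedence
describe.child <- function(x, ...) "child method"
r_child <- describe(obj)

c(r_parent, r_default, r_child)
```

The three results show the fallback chain: first class, next class, default.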

It is super easy to define your own methods for existing generics!

v1 <- "blabla"
## add a class, not replace
class(v1) <- c("bulba", class(v1))
print.bulba <- function(x, ...) {
  cat(paste0("An object of class bulba:\n", x, "\n"))
}
v1
## An object of class bulba:
## blabla

Note that we add a new class rather than replacing the existing one. That way, any generics that we have not defined for our new class will simply fall back to whatever is defined for the existing classes.

It is also easy to define your own generics

nonsense <- function(x, ...) { UseMethod("nonsense", x) }
nonsense.default <- function(x, ...) {
  cat("Oh well, not a bulba then.\n")
}
nonsense.bulba <- function(x, ...) {
  cat(paste("This", x, "is nonsense!\n"))
}
nonsense(v1)
## This blabla is nonsense!
nonsense(pi)
## Oh well, not a bulba then.

But it gets better

Remember that everything is a function?

We can define generic operators to work on our class!

v1 <- "a"
class(v1) <- c("bulba", class(v1))
v2 <- "b"
class(v2) <- c("bulba", class(v2))

`+.bulba` <- function(a, b) {
  ret <- paste0(a, b)
  class(ret) <- "bulba"
  return(ret)
}

v1 + v2
## An object of class bulba:
## ab

And this is how ggplot works.

g1 <- ggplot(data=mtcars, aes(x=disp, y=hp, color=mpg)) +
  geom_point(size=5) + scale_color_viridis_c()
class(g1)
## [1] "gg"     "ggplot"
methods(class="gg")
## [1] +
## see '?methods' for accessing help and source code
methods(class="ggplot")
## [1] as_grob      get_alt_text ggplot_build plot         print        summary     
## see '?methods' for accessing help and source code
methods(print) %>% { .[grep("ggplot", .)] }
## [1] "print.ggplot"       "print.ggplot2_bins"
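
To see why a `+` method is enough to build up a plot piece by piece, here is a toy sketch in base R. The classes toyplot and toylayer and their constructors are invented for this example and have nothing to do with the real ggplot2 internals.

```r
# Made-up classes: a "plot" that only collects layers
toyplot  <- function(data) structure(list(data = data, layers = list()), class = "toyplot")
toylayer <- function(name) structure(list(name = name), class = "toylayer")

# The + method appends the right-hand side to the list of layers
`+.toyplot` <- function(e1, e2) {
  e1$layers <- c(e1$layers, list(e2))
  e1
}

# Nothing is "drawn" until the object is printed
print.toyplot <- function(x, ...) {
  layer_names <- vapply(x$layers, function(l) l$name, character(1))
  cat("toyplot with", nrow(x$data), "rows and layers:",
      paste(layer_names, collapse = ", "), "\n")
  invisible(x)
}

tp <- toyplot(mtcars) + toylayer("points") + toylayer("smooth")
tp
```

Each `+` returns a toyplot again, so layers can be chained, and the print method decides what to do with the collected layers.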

A practical example

I have created the colorDF package, which defines a new colorDF class. It is basically your regular data frame or tibble plus some additional attributes which tell colorDF methods how to display the object using terminal colors.

library(colorDF)
mtcars %>% as.colorDF()

Which methods have been defined for the colorDF class?

methods(, "colorDF")
colorDF:::print.colorDF

print.colorDF / print_colorDF

  • is not exported directly from the package (instead, there is a function called print_colorDF).
  • displays by default only a number of rows (this can be changed with the option colorDF_n)
  • colors depend on the theme (option colorDF_theme)
  • option colorDF_tibble_style means that we only display as many columns as will fit on the screen

A practical example

Fine, but I liked my output so much that I wanted every data frame to be displayed with print.colorDF. For this, I put the following in my .Rprofile:

print.data.frame <- colorDF:::print.colorDF
print.tbl        <- colorDF:::print.colorDF
print.data.table <- colorDF:::print.colorDF

A practical example

The package S4Vectors defines an S4 class called DataFrame. I wanted my function to also be able to display DataFrame objects created by e.g. DESeq2. For this, I put the following in my .Rprofile:

setMethod("show", "DataFrame", function(object) { colorDF:::print.colorDF(object) })

This is the S4 way of setting the default print (show) method for an object of S4 class DataFrame.
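
For comparison with S3, here is a minimal self-contained S4 sketch. The class Measurement and its slots are hypothetical; only setClass, setMethod, new and show come from the methods package.

```r
library(methods)

# Hypothetical S4 class with two typed slots
setClass("Measurement", representation(value = "numeric", unit = "character"))

# show() is what R calls when an S4 object is printed at the console
setMethod("show", "Measurement", function(object) {
  cat("Measurement:", object@value, object@unit, "\n")
})

m <- new("Measurement", value = 3.2, unit = "kg")
m

# S4 is stricter than S3: a slot of the wrong type is rejected
bad <- tryCatch(new("Measurement", value = "heavy", unit = "kg"),
                error = function(e) "rejected")
bad
```

The slot check is the kind of safety that the informal S3 system does not provide.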

A practical example

Also, I have implemented the method summary.colorDF (exported as summary_colorDF). Likewise, I have set up R to use it with the different data-frame-like objects in my .Rprofile:

summary.data.frame <- colorDF:::summary.colorDF
summary.tbl <- colorDF:::summary.colorDF
summary.data.table <- colorDF:::summary.colorDF

(This part will not work for the results of DESeq2: they are of class DESeqResults, which inherits from DataFrame, and have their own summary method. Also, summary_colorDF does not yet understand DataFrame objects.)

OK, what about the attributes

I told you that a tibble is essentially a data frame:

class(starwars)

So what changes when we add a grouping variable?

OK, what about the attributes

attributes(starwars)
sw <- starwars %>% group_by(homeworld)
attributes(sw)
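
Attributes are ordinary named metadata attached to an object, and the class is just one of them. The base R sketch below uses a made-up attribute called note; as far as I know, dplyr stores the grouping information in a similar way, in an attribute called groups.

```r
df <- data.frame(x = 1:3, y = c("a", "b", "c"))

# A plain data frame carries three attributes: names, class and row.names
attr_names <- names(attributes(df))
attr_names

# We can attach our own metadata ("note" is a hypothetical attribute name)
attr(df, "note") <- "measured in 2022"
attr(df, "note")

# The class is itself an attribute, so adding a class changes attributes(df) too
class(df) <- c("annotated", class(df))
attributes(df)$class
```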

Exercise 4/0

Principles of data presentation

Minard

Edward Tufte

“Graphical excellence is that which gives to the viewer the greatest number of ideas in the shortest time with the least ink in the smallest space.”

Edward Tufte – Books

Less is more

Data visualization is all about communication.

Just like in graphic design, less is more. To get a good graphic, remove all excess ink.

Checklist for making graphs

  • What do I want to say?
  • What do I need to say?
  • What part of my information is redundant?
  • What is the standard way of displaying the information in my field?

Resist the temptation of showing every bit of data. If necessary, put it in the supplementary materials.

Average MPG depending on number of cylinders

p <- mtcars %>% group_by(cyl) %>% 
      summarise(mean_mpg=mean(mpg)) %>%
      mutate(cyl=factor(cyl)) %>% 
      ggplot(aes(x=cyl, y=mean_mpg, fill=cyl))
p + geom_bar(stat="identity", mapping=aes(fill=cyl)) + 
  theme(axis.line=element_line(size=1, arrow=arrow(length=unit(0.1, "inches"))))

All bells and whistles

“Clutter and confusion are failures of design, not attributes of information.” (Tufte)

Remove legend

Remove axes

Remove color

Narrow bars

Remove vertical grid

Remove grey background

Add meaningful labels
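
The steps above could be put together roughly as follows. This is one possible sketch, not the exact code behind the slides; the data summary is done with base R aggregate() to keep the example short.

```r
library(ggplot2)

# Mean mpg per number of cylinders
d <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
d$cyl <- factor(d$cyl)

p_clean <- ggplot(d, aes(x = cyl, y = mpg)) +
  geom_col(width = 0.5, fill = "grey30") +      # narrow bars, one color, so no legend
  theme_minimal() +                             # no grey background
  theme(panel.grid.major.x = element_blank(),   # no vertical grid
        panel.grid.minor   = element_blank(),
        axis.line          = element_blank()) + # no axis lines
  labs(x = "Number of cylinders", y = "Mean miles per gallon")
p_clean
```

Since fill is no longer mapped to a variable, the legend disappears on its own.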

Box plots: default R

boxplot(hwy ~ class, data=mpg)

Box plots: ggplot2

mpg %>% ggplot(aes(x=class, y=hwy)) + geom_boxplot()

Box + scatter plots

mpg %>% ggplot(aes(x=class, y=hwy)) + geom_boxplot() +
    geom_dotplot(binaxis="y", stackdir="center", fill="grey", dotsize=.3)

Box plots: Tufte

mpg %>% ggplot(aes(x=class, y=hwy)) + geom_boxplot() +
    geom_dotplot(binaxis="y", stackdir="center", fill="grey", dotsize=.3)
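
The code above is the same as for the box + scatter plot. A Tufte-style box plot can be drawn with geom_tufteboxplot() from the ggthemes package, the package that also provides theme_tufte(). Whether this is what the slide originally showed is an assumption.

```r
library(ggplot2)
library(ggthemes)

# Minimal box plot: a point for the median, lines for the whiskers, no box
p_tufte <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_tufteboxplot() +
  theme_tufte()
p_tufte
```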

Scatter plot variants

p <- list()
p$p1 <- ggplot(mtcars, aes(x=disp, y=hp, color=factor(cyl))) + geom_point() 
p$p2 <- ggplot(mtcars, aes(x=disp, y=hp, color=factor(cyl))) + geom_point() + 
  theme_par()
p$p3 <- ggplot(mtcars, aes(x=disp, y=hp, color=factor(cyl))) + geom_point() + 
  theme_cowplot()
p$p4 <- ggplot(mtcars, aes(x=disp, y=hp, color=factor(cyl))) + geom_point() + 
  theme_tufte()

p <- map(p, ~ . + theme(plot.margin=margin(20, 0, 0, 0)))
plot_grid(plotlist=p, labels=c("Default", "Par", "Cowplot", "Tufte"))

“Above all else show the data.” (Tufte)

Common problems and solutions

Avoid bar charts

  • Bar charts have their purpose: showing proportions or absolute quantities (1 value per bar)
  • Y axis must always start at 0, because bar charts communicate through the area of the bar
  • Bar charts are often misused to show sample means and sample spread; they should be replaced by box plots, violin plots or dot plots.
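
A small simulation shows what a bar chart of means can hide. The two samples below are made up: they have the same mean, but one of them is bimodal.

```r
set.seed(1)
# Two invented samples with identical means but very different shapes
a <- rnorm(50, mean = 10, sd = 1)
b <- c(rnorm(25, mean = 5, sd = 0.5), rnorm(25, mean = 15, sd = 0.5))
b <- b - mean(b) + mean(a)   # shift b so that both means are identical

c(mean_a = mean(a), mean_b = mean(b))   # bars of the means would look the same
c(sd_a = sd(a), sd_b = sd(b))           # but the spread is very different

# Box plot plus raw points makes the difference visible
boxplot(list(a = a, b = b), ylab = "value")
stripchart(list(a = a, b = b), vertical = TRUE, method = "jitter",
           add = TRUE, pch = 19, col = "#00000055")
```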

(demo)

Editorial. “Kick the bar chart habit.” Nature Methods 11 (2014): 113.

Avoid pie charts

  • Pie charts are bad at communicating information, just don’t use them
  • Don’t even mention 3D pie charts
  • There are tons of alternatives to pie charts

Graphics systems in R

Important message

It is not important which system you use. It is important that you first come up with an idea of how you want the data to be plotted, and that you can plot it – with whatever means you can. (Where should you look for a lost watch?)

Graphics systems in R

  • graphic devices (PDF, SVG, PNG, …)
  • base R: plotting primitives, par() system
  • rgl (3D graphics): 3D plotting primitives
  • plotrix: a collection of plotting functions
  • grid: an alternative system of plotting primitives
  • lattice: based on grid, sophisticated plotting
  • ggplot2: based on grid, sophisticated plotting

Graphics in base R: Pros

  • easy learning curve
  • superb for quick and dirty plots
  • relatively easy to add arbitrary elements to the plots
  • widely spread, always there for you
  • many specialized packages doing what otherwise cannot be easily achieved

Graphics in base R: Cons

  • messy
  • not standardized
  • once you put something on the plot, it stays there
  • some things are incredibly hard to achieve
  • by default ugly as sin

Graphics in ggplot2: Pros

  • well thought through and organized
  • easy to modify a ready made plot
  • dozens of different plot types with a similar interface
  • tons of ggplot2 based packages
  • widely spread

Graphics in ggplot2: Cons

  • steeper learning curve, completely different philosophy
  • I for one always have to look up even the simplest things
  • some things are incredibly hard to achieve
  • if it’s not there, you are on your own
  • huge data sets problematic (require huge data frames -> use tibble)
  • by default ugly as sin

Things that base R sucks at

  • changing the overall aesthetics of the plot easily
  • modifying certain parameters (like text)
  • making facets

Things that ggplot2 sucks at

  • networks (use igraph or graphviz for that)
  • identifying points on a plot, interactive graphics
  • 3D graphics
  • slooooooow

ggplot2 overview

  • mapping between variables and “aesthetics”
    • x, y
    • color, fill
    • symbol
    • ymin/ymax, xmin/xmax (error bars)
  • geom (points, segments, bars, whatever)
  • guides (axes, legends)
  • theme (specific look of things)

Simple plot

data(mpg)
ggplot(mpg, aes(x=hwy, y=cty)) + geom_point()

Example session

We will now use world inequality data to create a bar plot.

First, we prepare the data using tidyverse.

wid <- read_excel("../Datasets/WIID_19Dec2018.xlsx")
wid <- wid %>% drop_na(gini_reported, q1:q5, d1:d10)
wid2015 <- wid %>% filter(year==2015 & 
                          region_un == "Europe" & 
                          population > 5e6)
wid2015sel <- wid2015  %>% 
  filter(quality=="High") %>%
  filter(!duplicated(country)) %>% 
  select(country, gini_reported, q1:q5, d1:d10)

## we mess the quantiles on purpose
data <- wid2015sel %>% 
  gather(q1:q5, key="quantile", value="proportion") %>%
  mutate(quantile=factor(quantile, levels=paste0("q", c(2, 1, 5, 4, 3))))

Example session

Now we pass the data to ggplot.

p <- data %>%
  ggplot(aes(country, proportion, fill=quantile)) + geom_bar(stat="identity") 
p
  • geom_bar() uses the fill aesthetic
  • stat="identity" means that the bar plot height has already been calculated

Example session

Now we pass the data to ggplot.

p <- data %>%
  ggplot(aes(country, proportion, fill=quantile)) +
  geom_bar(stat="identity") + coord_flip()
p
  • coord_flip() so the bar plot is horizontal

Example session

First, reorder the quantile factor

data <- data %>% mutate(quantile=factor(quantile, levels=paste0("q", 5:1)))
p <- data %>%
  ggplot(aes(country, proportion, fill=quantile)) +
  geom_bar(stat="identity") + coord_flip()
p

Reorder the countries

data <- wid2015sel %>% 
  mutate(country=reorder(country, desc(gini_reported))) %>%
  gather(q1:q5, key="quantile", value="proportion") %>%
  mutate(quantile=factor(quantile, levels=paste0("q", 5:1)))
p <- data  %>%
  ggplot(aes(country, proportion, fill=quantile)) +
  geom_bar(stat="identity") + coord_flip()
p

Make it nice!

p + theme_tufte() + scale_fill_brewer(palette="Blues") +
  ylab("Proportion of wealth") + xlab("Country") +
  guides(fill=guide_legend(reverse=TRUE))

Exercise 4/1

Eine kleine Farbenlehre

Farbenlehre (Color theory)

  • What is the function of color on the plot?
  • Does the color help or distract?
  • Do I really need color?
  • If you need more than five distinct colors (I don’t mean a gradient), you probably are doing something wrong.

Representing colors

There are many ways to represent colors. In R, we most frequently use the RGB scheme, in which each color is described by three values, one for each of the three channels: red, green and blue.

One way is to choose values between 0 and 1; another, between 0 and 255. The latter can be represented using hexadecimal notation, in which the value goes from 0 to FF (15 * 16 + 15 = 255). This is a very common notation, used also in HTML:

  • "#FF0000" or c(255, 0, 0): red channel to the max, blue and green to the minimum. The result is color red.
  • "#00FF00": bright green
  • "#000000": black
  • "#FFFFFF": white
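
The hexadecimal notation can be checked directly in base R. The sketch below only uses strtoi, col2rgb, sprintf and rgb.

```r
# FF in base 16 is 15 * 16 + 15 = 255
ff <- strtoi("FF", base = 16L)
ff

# From a hex string to the three channel values (0 to 255)
red <- col2rgb("#FF0000")
red

# And back: format three channel values as two hex digits each
orange_hex <- sprintf("#%02X%02X%02X", 255L, 128L, 0L)
orange_hex

# rgb() does the same when told that the maximum value is 255
orange_rgb <- rgb(255, 128, 0, maxColorValue = 255)
orange_rgb
```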

Getting the colors

  • To get the color from numbers in 0…1 range:

    rgb(0.5, 0.7, 0) # returns "#80B300"

  • To get the color from numbers in 0…255 range:

    rgb(255, 128, 0, maxColorValue=255)

Alpha channel: transparency

Useful way to handle large numbers of data points. #FF000000: fully transparent; #FF0000FF: fully opaque.

x <- rnorm(10000)
y <- x + rnorm(10000)
p1 <- ggplot(NULL, aes(x=x, y=y)) + geom_point() + 
  theme_tufte() + theme(plot.margin=unit(c(2,1,1,1), "cm"))
p2 <- ggplot(NULL, aes(x=x, y=y)) + geom_point(color="#6666661F") + 
  theme_tufte() + theme(plot.margin=unit(c(2,1,1,1),"cm"))
plot_grid(p1, p2, labels=c("Black", "#6666661F"))
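
The alpha channel is simply a fourth pair of hex digits. A few base R calls make this visible; the color "#6666661F" is the one used in the plot above.

```r
# rgb() accepts an alpha value on the same scale as the color channels
transparent <- rgb(1, 0, 0, alpha = 0)   # "#FF000000"
opaque      <- rgb(1, 0, 0, alpha = 1)   # "#FF0000FF"

# adjustcolor() adds transparency to an existing color
half_red <- adjustcolor("red", alpha.f = 0.5)   # "#FF000080"

# Decode the color used above: 1F is 31, i.e. about 12% opacity
grey_a <- col2rgb("#6666661F", alpha = TRUE)
grey_a
grey_a["alpha", 1] / 255
```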

Other color systems

There are several other representations of color space, and they do not give exactly the same results. Two common representations are HSV and HSL: Hue, Saturation and Value, and Hue, Saturation and Lightness.
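
Base R has built-in support for HSV, which is enough to get a feeling for the idea (for HSL you need an additional package). A short sketch:

```r
# Hue 0 with full saturation and value is pure red
red_hsv <- hsv(h = 0, s = 1, v = 1)       # "#FF0000"

# A third of the way around the hue circle is green
green_hsv <- hsv(h = 1/3, s = 1, v = 1)   # "#00FF00"

# Lower saturation moves the color towards white
pink_hsv <- hsv(h = 0, s = 0.5, v = 1)    # "#FF8080"

# Converting back from RGB to HSV
back <- rgb2hsv(col2rgb("#FF0000"))
back
```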

Manipulating colors

There are many packages to help you manipulate the colors using HSL and HSV. For example, my package plotwidgets allows you to change them using the HSL model.

library(plotwidgets)
## Now loop over hues
pal <- plotPals("zeileis")
v <- c(10, 9, 19, 9, 15, 5)

a2xy <- function(a, r=1, full=FALSE) {
  t <- pi/2 - 2 * pi * a / 360
  list( x=r * cos(t), y=r * sin(t) )
}

plot.new()
par(usr=c(-1,1,-1,1))
hues <- seq(0, 360, by=30)
pos <- a2xy(hues, r=0.75)
for(i in 1:length(hues)) {
  cols <- modhueCol(pal, by=hues[i])
  wgPlanets(x=pos$x[i], y=pos$y[i], w=0.5, h=0.5, v=v, col=cols)
}

pos <- a2xy(hues[-1], r=0.4)
text(pos$x, pos$y, hues[-1])

Palettes

It is not easy to get a nice combination of colors (see default plot in ggplot2 to see how not to do it).

There are numerous palettes in numerous packages. One of the most popular is RColorBrewer. You can use it with both base R and ggplot2.

RColorBrewer palettes

library(RColorBrewer)
par(mar=c(0,4,0,0))
display.brewer.all()

RColorBrewer palettes: color blind

par(mar=c(0,4,0,0))
display.brewer.all(colorblindFriendly=TRUE)

Iris data set

data("iris")

The use of multiple measurements in taxonomic problems as an example of linear discriminant analysis. Fisher 1936

How to use RColorBrewer with base R

pal <- brewer.pal(3, "Dark2")
iris$Species <- factor(iris$Species)
cols <- pal[ iris$Species ]
plot(iris$Sepal.Length, iris$Sepal.Width, col=cols, pch=19,
  xlab="Sepal length", ylab="Sepal width", bty="n", cex=1.5)
legend("topright", levels(iris$Species), col=pal, pch=19, bty="n")

RColorBrewer and ggplot

You can easily use ggplot with RColorBrewer palettes:

ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width, color=Species)) + 
  geom_point(size=4) + 
  scale_color_brewer(palette="Dark2") + 
  theme_tufte() + 
  theme(axis.title.y=element_text(margin=margin(0,10,0,0)), 
        axis.title.x=element_text(margin=margin(10, 0, 0, 0)))

Gallery of RColorBrewer palettes: Dark2

Pastel1

Paired

Set2

The viridis scale

For base R, use the following code:

library(scales)
pal <- viridis_pal()(n=6)
show_col(pal)

The viridis scale

Implemented in ggplot functions:

  • scale_(color|fill)_viridis_(c|d)
  • c for continuous, d for discrete

e.g. 

ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width, color=Species)) + 
  geom_point(size=4) + scale_color_viridis_d() + theme_tufte() + 
  theme(axis.title.y=element_text(margin=margin(0,10,0,0)), 
        axis.title.x=element_text(margin=margin(10, 0, 0, 0)))

Other sources

  • my package plotwidgets implements a couple of other palettes
  • You can always define your own colors!
  • Use a color picker to “steal” palettes that you think are nice
  • You can use this tool or this one to design colorblind friendly palettes

Using manual (or plotwidgets) palettes in ggplot2

par(mar=c(0,4,0,0))
library(plotwidgets)
showPals()

scale_color_manual

pal <- plotPals("darkhaze")
pal
ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width, color=Species)) + 
  geom_point(size=4) + scale_color_manual(values=pal) + theme_tufte() + 
  theme(axis.title.y=element_text(margin=margin(0,10,0,0)), 
        axis.title.x=element_text(margin=margin(10, 0, 0, 0)))

Continuous scales

In base R, we can use colorRampPalette()

pal_func <- colorRampPalette(c("cyan", "black", "purple"))
pal <- pal_func(15)
pal
##  [1] "#00FFFF" "#00DADA" "#00B6B6" "#009191" "#006D6D" "#004848" "#002424" "#000000" "#160422" "#2D0944" "#440D66" "#5B1289" "#7216AB" "#891BCD" "#A020F0"

Continuous scales

In ggplot, there are a number of continuous scales available.

  • scale_(color|fill)_viridis_c (viridis color scale)
  • scale_(color|fill)_gradient (two colors)
  • scale_(color|fill)_gradient2 (three colors)

The problem is with exactly defining the break points (which value corresponds to which color?)
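
One way to keep control over the break points is to assign the colors yourself. The base R sketch below uses hand-picked breaks and colors (both are arbitrary choices for illustration). In ggplot2, the midpoint argument of scale_(color|fill)_gradient2 serves a similar purpose for the middle color.

```r
values <- c(-3, -0.5, 0, 0.2, 4)

# Hand-picked break points: low / middle / high
breaks <- c(-Inf, -1, 1, Inf)
pal    <- c("blue", "grey80", "red")   # one color per interval

# cut() tells us into which interval each value falls
bins <- cut(values, breaks = breaks)
cols <- pal[as.integer(bins)]

data.frame(value = values, bin = bins, color = cols)
```

With explicit breaks there is no doubt which value corresponds to which color.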

Exercise 4/2

Comparing

ggplot2 tips and tricks

Resources

qplot: gentle introduction to ggplot

qplot is an interface to ggplot which uses a syntax similar to the base plot function. Note that qplot() is marked as deprecated in recent ggplot2 releases.

Working with themes

The theme() and theme_*() functions return an object of the class theme which can be added to a ggplot in order to change the appearance of several elements. The list of the elements you can theme can be found in the theme() help page. You can also add themes to each other. The result is again a theme object that you can reuse and even set as default. This makes it easy to create your own themes.
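
A short sketch of how this might look in practice. The name my_theme is arbitrary; theme_minimal(), theme() and theme_set() are regular ggplot2 functions.

```r
library(ggplot2)

# Adding two themes gives a new theme object
my_theme <- theme_minimal() +
  theme(legend.position = "bottom",
        panel.grid.minor = element_blank())

# Use it for a single plot...
p <- ggplot(mtcars, aes(x = disp, y = hp, color = factor(cyl))) + geom_point()
p_themed <- p + my_theme
p_themed

# ...or make it the default; theme_set() returns the previous default,
# so it can be restored later
old <- theme_set(my_theme)
theme_set(old)
```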

Common uses of themes

## set the theme to something nicer
theme_set(theme_minimal())

g1 <- ggplot(iris, aes(x=Sepal.Length,y=Sepal.Width,color=Species)) + 
  geom_point()
## Remove legend title
g1 + theme(legend.title = element_blank())

## Remove legend entirely
g1 + theme(legend.position = "none")

logarithmic scaling

In base R:

plot(..., log="xy") # to scale both axes

In ggplot2:

ggplot(data, aes(...)) + ... + scale_x_log10()

ggrepel

To avoid overlapping labels, we can use the ggrepel package.

library(ggrepel)
g1 <- ggplot(iris, aes(x=Sepal.Length,y=Sepal.Width,color=Species)) + 
  geom_point()
g1 + geom_label_repel(aes(label=Petal.Width))

Cowplot

There are two important functions in cowplot: predefined theme_cowplot(), which is quite nice, and plot_grid(), which rocks. plot_grid allows you to create separate plots and combine them in a number of ways. You can even draw a plot in base R, record it and include it in your plot_grid call!

facet_grid

You can get a lattice-like representation using the facet_grid() function. For example:

ggplot(mpg, aes(cty, hwy)) + geom_point() + facet_grid(rows=vars(cyl))

Density plots

data(mpg)
ggplot(mpg, aes(cty, fill=factor(cyl))) +
    geom_density(alpha=0.8) +
    labs(title="Density plot",
         subtitle="City Mileage Grouped by Number of cylinders",
         caption="Source: mpg",
         x="City Mileage",
         fill="# Cylinders")

Correlograms

library(ggcorrplot)

# Correlation matrix
data(mtcars)
corr <- round(cor(mtcars), 1)

# Plot
ggcorrplot(corr, hc.order = TRUE, 
           type = "lower", 
           lab = TRUE, 
           lab_size = 3, 
           method="circle", 
           colors = c("tomato2", "white", "springgreen3"), 
           title="Correlogram of mtcars", 
           ggtheme=theme_bw)

Animations

#  ```{r, animation.hook="gifski"}
#  for (i in 1:2) {
#    pie(c(i %% 2, 6), col = c('red', 'yellow'), labels = NA)
#  }
#  ```

Exercise 4/3

The gapminder data set

From data to figures

  • What is the message of the figure?
  • What data should be shown on a plot?
  • What relationships do you want to illustrate?
  • If it is tricky – I start with a pencil and a clean piece of paper!

Factfulness

“The world cannot be understood without numbers. But the world cannot be understood with numbers alone.”

― Hans Rosling, Factfulness: Ten Reasons We’re Wrong About the World—and Why Things Are Better Than You Think

Loading the gapminder data set

library(ggplot2)
theme_set(theme_bw())
library(gapminder)
knitr::kable(head(gapminder))
country      continent  year  lifeExp       pop  gdpPercap
Afghanistan  Asia       1952   28.801   8425333   779.4453
Afghanistan  Asia       1957   30.332   9240934   820.8530
Afghanistan  Asia       1962   31.997  10267083   853.1007
Afghanistan  Asia       1967   34.020  11537966   836.1971
Afghanistan  Asia       1972   36.088  13079460   739.9811
Afghanistan  Asia       1977   38.438  14880372   786.1134

First plot

gapminder %>% ggplot(aes(x=gdpPercap, y=lifeExp, color=year)) + geom_point()

First plot

gapminder %>% ggplot(aes(x=gdpPercap, y=lifeExp, color=year)) + geom_point() +
  scale_x_log10()

First plot

gapminder %>% ggplot(aes(x=gdpPercap, y=lifeExp, color=year)) + geom_point() +
  scale_x_log10() + scale_color_viridis_c()

Select only one year

gapminder %>% filter(year==2007) %>%
  ggplot(aes(x=gdpPercap, y=lifeExp, color=continent)) + geom_point() +
  scale_x_log10() + scale_color_brewer(palette="Dark2")

Add population data

gapminder %>% filter(year==2007) %>%
  ggplot(aes(x=gdpPercap, y=lifeExp, size=pop, color=continent)) + geom_point() +
  scale_x_log10() + scale_color_brewer(palette="Dark2")

Nicer colors from gapminder

gapminder %>% filter(year==2007) %>%
  ggplot(aes(x=gdpPercap, y=lifeExp, size=pop, color=country)) + 
  geom_point(alpha=.7, show.legend=FALSE) +
  scale_color_manual(values=country_colors) +
  scale_x_log10()

Comparison year 1952 and 2007

g1952 <- gapminder %>% filter(year == 1952) %>%
  ggplot(aes(x=gdpPercap, y=lifeExp, color=continent)) + geom_point() +
  scale_color_brewer(palette="Dark2") +
  ylim(range(gapminder$lifeExp)) +
  ## set the x limits inside scale_x_log10(); a separate xlim() would be replaced
  scale_x_log10(limits=range(gapminder$gdpPercap))

g2007 <- gapminder %>% filter(year == 2007) %>%
  ggplot(aes(x=gdpPercap, y=lifeExp, color=continent)) + geom_point() +
  scale_color_brewer(palette="Dark2") +
  ylim(range(gapminder$lifeExp)) +
  ## same limits as above, so that both panels are comparable
  scale_x_log10(limits=range(gapminder$gdpPercap))

plot_grid(g1952, g2007)

Comparison year 1952 and 2007

Much easier!

gapminder %>% filter(year %in% c(1952, 2007)) %>%
  ggplot(aes(x=gdpPercap, y=lifeExp, color=continent)) +
  scale_color_brewer(palette="Dark2") +
  geom_point() + facet_grid(. ~ year) + scale_x_log10()

Add population data

gapminder %>% filter(year %in% c(1952, 2007)) %>%
  ggplot(aes(x=gdpPercap, y=lifeExp, size=pop, color=continent)) +
  scale_color_brewer(palette="Dark2") +
  geom_point() + facet_grid(. ~ year) + scale_x_log10()

Slope diagram

tmp <- gapminder %>% filter(year %in% c(1952, 2007)) %>% 
  group_by(continent, year) %>% 
  summarise(mean=mean(gdpPercap), median=median(gdpPercap))
tmp %>% ggplot(aes(x=year, y=mean, color=continent)) + 
  geom_point() + geom_line() + 
  scale_y_log10() + 
  geom_label(aes(label=continent), hjust="outward", show.legend=F) + xlim(1945, 2020)

Dumbbell chart

gapminder %>% filter(year %in% c(1952, 2007) & continent=="Europe") %>% 
  arrange(gdpPercap, year) %>% 
  mutate(country=factor(country, levels=unique(country))) %>%
  ggplot(aes(x=gdpPercap, y=country, color=year)) + 
  geom_point() + geom_line()

Let’s move it

library(gganimate)
g <- gapminder %>% ggplot(aes(x=gdpPercap, y=lifeExp, size=pop, color=continent)) + 
  geom_point(alpha=.8) + 
  scale_color_brewer(palette="Dark2") +
  scale_x_log10() + 
  scale_size(range = c(2, 12)) +
  transition_time(year) + 
  labs(title = 'Year: {frame_time}', x = 'GDP per capita', y = 'life expectancy') +
  ease_aes("linear")

animate(g, duration = 15, fps = 20, width = 800, height = 500, renderer = av_renderer())
anim_save("gapminder.mp4")

Warning: gganimate has huge installation requirements, because you need a renderer library. Depending on your system, this might take a lot of disk space and cause a lot of headache. For example, using the gifski package requires you to install the Rust environment. Also, including animations in R Markdown documents might be problematic.
